Elasticsearch Query DSL Syntax Notes
I previously mentioned that I would be grinding away at Elasticsearch until the end of this year, but I've changed my plans. After this post, I will likely only add one more article on its application in .NET before wrapping things up. I originally expected to finish by the end of October, but it dragged into November. At this rate, the next post will probably be delayed until the end of the year as well.
I had intended to cover Geo Queries and Aggregations as well, but I decided it would be better to split them up. Aggregations lean more towards statistical analysis and aren't strictly related to query syntax itself; as for Geo Queries, I'll put them on hold. I've been working on this post for too long and it's becoming a bit tedious, so I'll write a separate one when I have the time and the mood.
My weight loss progress stalled from September 16th to October 9th, and in October, I inexplicably fell into a state of world-weariness, wanting only to stay home, read novels, and scroll through short videos, with no desire to go out or use my brain. I don't know if I'll fall back into that state, and I have no idea when the next post will be finished.
I previously wrote a note on Elasticsearch QueryString syntax, which mainly introduced how to use simple query strings in the query_string field. That syntax is concise and intuitive, making it very suitable for quick tests or simple search requirements. However, in actual production environments, Query DSL (Domain Specific Language) is used far more often. Query DSL is a JSON-structured query language provided by Elasticsearch, and it is considerably more powerful and flexible than Query String. This article organizes Query DSL syntax based primarily on my own test results, cross-verified against the official documentation.
Test version: Elasticsearch 9.1.5
Query DSL vs Query String
Before we begin, let's briefly explain the advantages of Query DSL over Query String:
1. More Complete Functionality
Certain query features can only be implemented using Query DSL and are not supported by Query String:
- Nested Queries: When you need to preserve the relationships between fields within nested objects, you must use the `nested` query in Query DSL.
- Geospatial Queries: Such as `geo_distance` and other geographic query functions.
- Custom Scoring: Use `function_score` to customize the relevance scoring of documents.
- Complex Boolean Logic Combinations: Flexibly combine `must`, `should`, `must_not`, `filter`, and other conditions through `bool` queries.
2. Clearer Structure
Query String:
{
"query": {
"query_string": {
"query": "title:Elasticsearch AND status:published AND created_date:[2024-01-01 TO 2024-12-31]"
}
}
}
Query DSL:
{
"query": {
"bool": {
"must": [
{ "match": { "title": "Elasticsearch" }},
{ "term": { "status": "published" }},
{ "range": {
"created_date": {
"gte": "2024-01-01",
"lte": "2024-12-31"
}
}
}
]
}
}
}
Although Query DSL looks more verbose, the structure is clearer. Each query condition has a specific type and parameters, making it easier to maintain and debug. Furthermore, Query DSL provides clearer error messages, explicitly pointing out which field or parameter is problematic.
Common Query DSL Syntax
1. Match Query - Full-Text Search
Used for full-text search; it performs tokenization and relevance scoring.
Applicable Types:
- Text fields: Tokenized, supports all advanced parameters.
- Keyword fields: Not tokenized, exact match.
- Numeric/Date/Boolean fields: Exact match, does not support parameters like `fuzziness` or `analyzer`.
Basic Query
{
"query": {
"match": {
"title": "Elasticsearch Tutorial"
}
}
}
operator Parameter
Controls the logical relationship between multiple tokens.
OR (Default)
Returns results if any of the terms match:
{
"query": {
"match": {
"title": {
"query": "quick brown fox",
"operator": "OR"
}
}
}
}
Effect: Documents containing any of the terms quick, brown, or fox will be returned.
AND
Must match all terms:
{
"query": {
"match": {
"title": {
"query": "quick brown fox",
"operator": "AND"
}
}
}
}
Effect: Documents must contain all three terms: quick, brown, and fox.
minimum_should_match Parameter
Important: This parameter is only effective when operator = "OR".
Controls the minimum number of conditions that must be met.
Positive Integer (Absolute Quantity)
{
"query": {
"match": {
"content": {
"query": "quick brown fox jumps",
"minimum_should_match": 3
}
}
}
}
Effect: At least 3 out of 4 terms must match.
Examples:
- `quick brown fox jumps` ✓ (All 4 match).
- `quick brown fox dog` ✓ (3 match: quick, brown, fox).
- `quick brown lazy dog` ✗ (Only 2 match: quick, brown).
- `the fox jumps high` ✗ (Only 2 match: fox, jumps).
Negative Integer (Allowed Missing Quantity)
{
"query": {
"match": {
"content": {
"query": "quick brown fox jumps",
"minimum_should_match": -1
}
}
}
}
Effect: At most 1 term can be missing, equivalent to requiring at least 3.
Examples:
- `quick brown fox jumps` ✓ (0 missing).
- `quick brown fox dog` ✓ (1 missing: jumps).
- `quick brown lazy dog` ✗ (2 missing: fox and jumps).
⚠️ Special Case: The minimum match count is guaranteed to be 1.
When setting -4 (missing count = total tokens) or -100% (100% missing), it will not return all data; at least 1 term must match to return results.
Examples (-4 or -100%):
- `quick dog` ✓ (1 match: quick).
- `brown cat` ✓ (1 match: brown).
- `lazy slow` ✗ (0 matches).
Percentage (Floor Rule)
{
"query": {
"match": {
"content": {
"query": "quick brown fox jumps",
"minimum_should_match": "75%"
}
}
}
}
Effect: At least 75% must match, which is at least 3 out of 4 terms (4 × 0.75 = 3).
⚠️ Calculation Rule (Floor Rule):
- `75%`: 4 × 0.75 = 3.0 → 3 terms.
- `74%`: 4 × 0.74 = 2.96 → floored to 2 terms.
- `50%`: 4 × 0.50 = 2.0 → 2 terms.
- `26%`: 4 × 0.26 = 1.04 → floored to 1 term.
- `25%`: 4 × 0.25 = 1.0 → 1 term.
Examples (75%):
- `quick brown fox jumps` ✓ (100% match).
- `quick brown fox dog` ✓ (3 match, 75% met).
- `quick brown dog cat` ✗ (Only 2 match, less than 75%).
Examples (74%):
- `quick brown dog cat` ✓ (2 match, 2.96 floored to 2).
- `quick dog cat rat` ✗ (Only 1 match).
Negative Percentage (Floor Rule)
{
"query": {
"match": {
"content": {
"query": "quick brown fox jumps",
"minimum_should_match": "-25%"
}
}
}
}
Effect: At most 25% missing, which is at most 1 term missing (4 × 0.25 = 1), equivalent to requiring at least 3.
⚠️ Calculation Rule (Floor Rule):
- `-25%`: 4 × 0.25 = 1 → At most 1 missing, requires 3.
- `-26%`: 4 × 0.26 = 1.04 → floored to 1, at most 1 missing, requires 3.
- `-74%`: 4 × 0.74 = 2.96 → floored to 2, at most 2 missing, requires 2.
- `-75%`: 4 × 0.75 = 3 → At most 3 missing, requires 1.
Examples (-25%):
- `quick brown fox jumps` ✓ (0 missing).
- `quick brown fox dog` ✓ (1 missing, meets at most 25% missing).
- `quick brown dog cat` ✗ (2 missing, exceeds limit).
Examples (-74%):
- `quick brown dog cat` ✓ (2 match, at most 2 missing).
- `quick dog cat rat` ✗ (Only 1 match, 3 missing).
Examples (-75%):
- `quick dog cat rat` ✓ (1 match, at most 3 missing).
- `lazy slow fast dog` ✗ (0 matches).
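The rules above (positive/negative integers, percentages with the floor rule, and the "at least 1" clamp) can be sketched in a few lines of Python. This is only an illustration of my test results, not Elasticsearch code; `required_matches` is a made-up helper name:

```python
import math

def required_matches(term_count: int, spec) -> int:
    """How many terms must match for a given minimum_should_match value
    (integer, negative integer, or percentage string), using the floor rule."""
    if isinstance(spec, str) and spec.endswith("%"):
        pct = int(spec[:-1])
        if pct >= 0:
            required = math.floor(term_count * pct / 100)
        else:
            # negative percentage: at most |pct|% of the terms may be missing
            required = term_count - math.floor(term_count * -pct / 100)
    elif spec >= 0:
        required = spec
    else:
        # negative integer: at most |spec| terms may be missing
        required = term_count + spec
    # the minimum match count is always clamped to at least 1
    return max(required, 1)

print(required_matches(4, "75%"))    # 3
print(required_matches(4, "74%"))    # 2 (floored)
print(required_matches(4, -1))       # 3
print(required_matches(4, "-100%"))  # 1 (clamped)
```

Running this against the examples in the tables above reproduces every case, including the -100% clamp.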
Single Condition Combination (Advanced)
⚠️ Important: How to interpret single conditions.
Format: N<VALUE or N>VALUE.
- `N<VALUE`: When token count ≤ N, use the default rule (100%); when > N, apply VALUE.
- `N>VALUE`: When token count > N, use the default rule (100%); when ≤ N, apply VALUE.
Example 1: 3<90%
{
"query": {
"match": {
"content": {
"query": "some long search query with many terms",
"minimum_should_match": "3<90%"
}
}
}
}
Interpretation:
- When query is ≤ 3 tokens: Requires 100% match (default).
- When query is > 3 tokens: Requires 90% match.
Example (Assuming query "one two three four five", 5 terms):
- `one two three four five` ✓ (100% match, 5/5).
- `one two three four dog` ✓ (80% match; passes because 5 > 3 and 5 × 0.90 = 4.5 floors to 4 required).
- `one two three dog cat` ✗ (Only 60% match, 3/5).
Example 2: 3<-1
{
"query": {
"match": {
"content": {
"query": "alpha beta gamma delta",
"minimum_should_match": "3<-1"
}
}
}
}
Interpretation:
- When query is ≤ 3 tokens: Requires 100% match.
- When query is > 3 tokens: At most 1 missing.
Example (4 terms):
- `alpha beta gamma delta` ✓ (0 missing).
- `alpha beta gamma dog` ✓ (1 missing: delta).
- `alpha beta dog cat` ✗ (2 missing: gamma and delta).
Multiple Condition Combination (Advanced)
⚠️ Important: Multiple conditions are interpreted differently from single conditions.
Format: N1<VALUE1 N2<VALUE2 ....
Multiple conditions are interpreted as "ranges" rather than "less than":
- Before the first condition: Use default rule (100%).
- Between N1 and N2: Apply VALUE1.
- After N2: Apply VALUE2.
Example: 2<-25% 9<-3
{
"query": {
"match": {
"content": {
"query": "very long search query with lots of terms",
"minimum_should_match": "2<-25% 9<-3"
}
}
}
}
⚠️ Correct Interpretation (Range approach):
- ≤ 2 tokens: 100% match (default).
- 3-9 tokens: At most 25% missing (applies first condition `-25%`).
- > 9 tokens: At most 3 missing (applies second condition `-3`).
❌ Incorrect Interpretation (Understanding via single condition logic):
- ≤ 2 tokens: apply `-25%` (Incorrect!)
- > 9 tokens: apply `-3` (Incorrect!)
Example (Assuming query of 10 terms):
- Matches 10 terms ✓ (0 missing).
- Matches 7 terms ✓ (3 missing, meets > 9 rule).
- Matches 6 terms ✗ (4 missing, exceeds limit).
Example (Assuming query of 5 terms):
- Matches 5 terms ✓ (0% missing).
- Matches 4 terms ✓ (1 missing, 5 × 25% = 1.25 → floored to 1, meets at most 1 missing).
- Matches 3 terms ✗ (2 missing, exceeds limit).
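The conditional formats can be sketched the same way. This illustration covers only the `N<VALUE` form and the range interpretation of multiple conditions described above (the `N>VALUE` form is omitted for brevity); `conditional_required_matches` is a made-up helper, not Elasticsearch code:

```python
import math

def _apply(term_count, value):
    # evaluate a simple minimum_should_match value (int or percentage string)
    if isinstance(value, str) and value.endswith("%"):
        pct = int(value[:-1])
        n = (math.floor(term_count * pct / 100) if pct >= 0
             else term_count - math.floor(term_count * -pct / 100))
    else:
        n = int(value) if int(value) >= 0 else term_count + int(value)
    return max(n, 1)

def conditional_required_matches(term_count, spec):
    """Interpret conditional specs like '3<90%' or '2<-25% 9<-3'.
    Conditions form ranges: below the first N, 100% is required;
    the last condition whose N is below term_count applies."""
    result = term_count  # default: all terms must match
    for cond in spec.split():
        n, value = cond.split("<", 1)
        if term_count > int(n):
            result = _apply(term_count, value)
    return result

print(conditional_required_matches(5, "3<90%"))        # 4
print(conditional_required_matches(10, "2<-25% 9<-3")) # 7
print(conditional_required_matches(5, "2<-25% 9<-3"))  # 4
```

Note how iterating through the conditions in order naturally yields the "range" behavior: each later condition overrides the earlier one once its threshold is exceeded.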
fuzziness Parameter
Fuzzy matching, allows for spelling errors. Only applicable to text fields.
AUTO (Recommended)
{
"query": {
"match": {
"title": {
"query": "Elasticsearc",
"fuzziness": "AUTO"
}
}
}
}
Effect: Automatically determines the allowed edit distance based on term length.
Examples:
- `Elasticsearch` ✓ (1 char difference: h).
- `Elasticsearc` ✓ (Exact match).
- `Elasticserch` ✓ (1 char difference).
- `Elastix` ✗ (Too much difference).
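Per the official documentation, the default AUTO setting is equivalent to `AUTO:3,6`: terms shorter than 3 characters must match exactly, terms of 3-5 characters allow 1 edit, and terms of 6 or more characters allow 2. As a sketch (`auto_fuzziness` is a made-up helper):

```python
def auto_fuzziness(term: str) -> int:
    """Edit distance allowed by fuzziness AUTO for a given term,
    using the default AUTO:3,6 length thresholds."""
    n = len(term)
    if n <= 2:
        return 0  # must match exactly
    if n <= 5:
        return 1  # one edit allowed
    return 2      # two edits allowed

print(auto_fuzziness("ox"))            # 0
print(auto_fuzziness("fox"))           # 1
print(auto_fuzziness("Elasticsearc"))  # 2
```

This is why the 12-character query above tolerates up to 2 edits.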
Fixed Edit Distance
{
"query": {
"match": {
"title": {
"query": "quikc brown",
"fuzziness": 1
}
}
}
}
Effect: Allows at most 1 character difference (insertion, deletion, substitution).
Examples:
- `quick brown` ✓ (quikc → quick, 1 char difference).
- `quikc brown` ✓ (Exact match).
- `qukc brown` ✗ (2 char difference).
- `qick brown` ✓ (1 char difference).
Related Parameters
{
"query": {
"match": {
"title": {
"query": "quikc brown fox",
"fuzziness": "AUTO",
"prefix_length": 2,
"max_expansions": 10,
"fuzzy_transpositions": true
}
}
}
}
Parameter Explanation:
- `prefix_length`: The first N characters must match exactly, default is `0`.
- `max_expansions`: Maximum number of candidate terms to expand during fuzzy matching, default is `50`.
- `fuzzy_transpositions`: Whether to allow adjacent character swaps (ab → ba), default is `true`.
Example (prefix_length = 2):
- `quick brown fox` ✓ (Starts with qu, matches prefix).
- `quikc brown fox` ✓ (Starts with qu, matches prefix).
- `xuick brown fox` ✗ (First 2 characters xu do not match qu).
Example (max_expansions = 10):
Suppose the index contains these terms: quick, quit, quiz, quiet, quiche, quill, quirk, quack, queue, quartz, qualify, quarrel... (20+ similar terms).
When querying qui:
{
"query": {
"match": {
"title": {
"query": "qui",
"fuzziness": 1,
"max_expansions": 10
}
}
}
}
Effect:
- Elasticsearch finds all similar terms with edit distance ≤ 1 (possibly 20+).
- Only the first 10 candidate terms are taken for searching (e.g., `qui`, `quit`, `quiz`, `quiet`, `quick`, `quiche`, `quill`, `quirk`, `quack`, `queue`).
- Other candidates (like `quartz`, `qualify`, `quarrel`...) are ignored.
Why limit this?
- Performance considerations: Expanding into dozens of candidates consumes significant computing resources, slowing down the query.
- Result quality: Too many candidates may include irrelevant results.
Example (fuzzy_transpositions = true):
- `qiuck` ✓ (ui ↔ iu, swapped).
- `qukic` ✓ (ki ↔ ik, swapped).
Example (fuzzy_transpositions = false):
{
"query": {
"match": {
"title": {
"query": "qiuck",
"fuzziness": 1,
"fuzzy_transpositions": false
}
}
}
}
- `qiuck` ✗ (ui ↔ iu swap not allowed, requires 2 edits: delete i, insert u).
- `quick` ✓ (Requires only 1 edit: replace i → u).
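The effect of fuzzy_transpositions corresponds to the difference between classic Levenshtein distance and the variant that counts an adjacent swap as a single edit (optimal string alignment). A sketch, with `edit_distance` as a made-up helper:

```python
def edit_distance(a: str, b: str, transpositions: bool = True) -> int:
    """Levenshtein distance; with transpositions=True an adjacent swap
    counts as one edit (optimal string alignment), mirroring the
    fuzzy_transpositions behavior described above."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[m][n]

print(edit_distance("qiuck", "quick", transpositions=True))   # 1
print(edit_distance("qiuck", "quick", transpositions=False))  # 2
```

With transpositions disabled, `qiuck` → `quick` costs 2 edits and therefore falls outside `fuzziness: 1`.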
Other Parameters
analyzer
Specifies the analyzer (defaults to the analyzer configured for the field):
{
"query": {
"match": {
"content": {
"query": "Quick Brown",
"analyzer": "standard"
}
}
}
}
lenient
Controls how to handle cases where the query value does not match the field type, default is false.
Parameter Explanation:
- `false` (default): Throws an error and the query fails if the type does not match.
- `true`: Ignores the query for that field if the type does not match; no error is thrown, but that field will have no matches.
Example 1: lenient = false (default)
{
"query": {
"match": {
"age": {
"query": "not a number"
}
}
}
}
Effect:
- Because the `age` field is numeric and the query value `"not a number"` is text, the query throws an error.
Example 2: lenient = true
{
"query": {
"match": {
"age": {
"query": "not a number",
"lenient": true
}
}
}
}
Effect:
- The query does not throw an error.
- But because the type does not match, the field will not match any documents (equivalent to the condition being ignored).
- The query executes normally, just with no results.
boost
Adjusts the relevance score weight, default is 1.0:
{
"query": {
"match": {
"title": {
"query": "Elasticsearch",
"boost": 2.0
}
}
}
}
zero_terms_query
How to handle cases where the query results in no tokens after analysis (becomes an empty query), default is none.
Parameter Explanation:
- `none` (default): Returns no documents.
- `all`: Returns all documents (equivalent to match_all).
Example 1: Empty string query
{
"query": {
"match": {
"message": {
"query": "",
"zero_terms_query": "none" // or "all"
}
}
}
}
Effect:
- `zero_terms_query: "none"`: Returns no documents.
- `zero_terms_query: "all"`: Returns all documents.
Example 2: Stop filter removes all terms
Suppose the message field uses a stop filter containing to, be, or, not (requires extra configuration), when querying "to be or not to be":
{
"query": {
"match": {
"message": {
"query": "to be or not to be",
"zero_terms_query": "none" // or "all"
}
}
}
}
Process:
- Original query: `"to be or not to be"`.
- The stop filter removes all stop words, leaving 0 tokens (the query becomes empty).
- `zero_terms_query: "none"`: Returns no documents; `zero_terms_query: "all"`: Returns all documents.
Use Cases:
- `zero_terms_query: "all"`: Search boxes that allow empty queries, or where users might only input stop words but still expect feedback.
- `zero_terms_query: "none"`: Disallows empty queries (the default in most scenarios).
WARNING
zero_terms_query is only triggered when the query truly becomes empty.
If the query terms are not removed but simply cannot be found in the index, it will return 0 results normally rather than triggering zero_terms_query. For example, if the field does not have a stop filter configured, querying "to be or not to be" will not trigger zero_terms_query, but will search for those terms normally.
2. Multi Match Query - Multi-field Search
Searches for the same keyword across multiple fields.
{
"query": {
"multi_match": {
"query": "Elasticsearch",
"fields": ["title^3", "content", "tags"],
"type": "best_fields"
}
}
}
Parameter Explanation:
- `fields`: List of fields; the number after `^` represents the weight. Fields can use wildcards, e.g., `"title"` and `"*_name"` will search `title`, `first_name`, `last_name`, etc.
- `type`: Query type.
Parameter Support by Type
| Parameter | Description | best_fields | most_fields | cross_fields | phrase | phrase_prefix | bool_prefix |
|---|---|---|---|---|---|---|---|
| `fuzziness` | Fuzzy match, allows spelling errors (supports AUTO, 0, 1, 2) | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ |
| `prefix_length` | First N characters must match exactly (default 0) | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ |
| `max_expansions` | Max candidate terms to expand during fuzzy match (default 50) | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ |
| `fuzzy_transpositions` | Whether to allow adjacent character swaps (default true) | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ |
| `fuzzy_rewrite` | Rewrite method for fuzzy queries | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ |
| `slop` | Allowed term spacing for phrase queries | ❌ | ❌ | ❌ | ✅ | ✅ | ❌ |
lenient Parameter
The lenient parameter is particularly useful in multi-field queries because different fields may have different data types.
Suppose the index has the following fields:
- `title` (text)
- `price` (integer)
{
"query": {
"multi_match": {
"query": "not a number",
"fields": ["title", "price"],
"lenient": false
}
}
}
Effect (lenient = false, default):
- The `title` field is text and can handle `"not a number"` normally.
- The `price` field is integer and cannot handle `"not a number"`.
- The query throws an error, and the entire query fails.
{
"query": {
"multi_match": {
"query": "not a number",
"fields": ["title", "price"],
"lenient": true
}
}
}
Effect (lenient = true):
- The `title` field searches `"not a number"` normally.
- The `price` field is ignored due to type mismatch; no error is thrown.
- The query executes normally, searching only in the `title` field.
Query Type Explanation
To better illustrate the differences between various query types, we use the following test data:
Test Data:
// Document 1
{
"title": "brown fox jumps",
"subject": "quick animal",
"message": "The quick brown fox"
}
// Document 2
{
"title": "quick brown",
"subject": "fox hunting",
"message": "Guide to fox hunting"
}
// Document 3
{
"title": "fast animal",
"subject": "brown bear",
"message": "The brown bear is slow"
}
best_fields (default)
Takes the score of the highest-scoring field, suitable for finding "best match in a single field".
{
"query": {
"multi_match": {
"query": "quick brown fox",
"type": "best_fields",
"fields": ["title", "subject", "message"],
"tie_breaker": 0.3
}
}
}
Internal Execution Logic (equivalent to):
{
"query": {
"dis_max": {
"queries": [
{ "match": { "title": "quick brown fox" }},
{ "match": { "subject": "quick brown fox" }},
{ "match": { "message": "quick brown fox" }}
],
"tie_breaker": 0.3
}
}
}
Scoring Method:
- Takes the score of the highest-scoring field.
- If `tie_breaker` is set, it becomes: Highest Score + (Other field scores × tie_breaker).
Query Result Analysis:
Assuming "quick brown fox" is queried, the base score for each field is as follows (actual scores are affected by BM25 algorithm, term frequency, document length, etc.):
| Document | title Score | subject Score | message Score | Final Score Calculation (tie_breaker=0.3) |
|---|---|---|---|---|
| Doc 1 | 1.5 (brown, fox) | 1.0 (quick) | 5.0 (quick, brown, fox) | 5.0 + (1.5 + 1.0) × 0.3 = 5.75 |
| Doc 2 | 3.0 (quick, brown) | 1.0 (fox) | 1.0 (fox) | 3.0 + (1.0 + 1.0) × 0.3 = 3.6 |
| Doc 3 | 0 | 1.0 (brown) | 1.0 (brown) | 1.0 + 1.0 × 0.3 = 1.3 |
Calculation Logic:
- Select the highest-scoring field as the base score.
- Multiply the scores of all other matching fields by the tie_breaker and sum them up.
- Formula: Highest Score + (Sum of other field scores × tie_breaker).
Conclusion: Document 1 has the highest score because the message field contains all three terms and is the highest-scoring, while the other two fields also contribute.
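The tie_breaker formula is easy to verify with a few lines of Python, using the illustrative per-field scores from the table above (`dis_max_score` is a made-up helper, not an Elasticsearch API):

```python
def dis_max_score(field_scores, tie_breaker=0.0):
    """best_fields scoring: highest field score plus the remaining
    matching fields' scores multiplied by tie_breaker."""
    best = max(field_scores)
    others = sum(field_scores) - best
    return best + others * tie_breaker

# per-field base scores (title, subject, message) from the table above
print(dis_max_score([1.5, 1.0, 5.0], 0.3))  # Doc 1: 5.75
print(dis_max_score([3.0, 1.0, 1.0], 0.3))  # Doc 2: 3.6
print(dis_max_score([0.0, 1.0, 1.0], 0.3))  # Doc 3: 1.3
```

With the default `tie_breaker = 0.0`, only the best field counts, which is the plain dis_max behavior.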
most_fields
Combines the scores of all fields, suitable for "multiple similar fields" (e.g., different tokenization methods for the same content).
{
"query": {
"multi_match": {
"query": "quick brown fox",
"type": "most_fields",
"fields": ["title", "subject", "message"]
}
}
}
Internal Execution Logic (equivalent to):
{
"query": {
"bool": {
"should": [
{ "match": { "title": "quick brown fox" }},
{ "match": { "subject": "quick brown fox" }},
{ "match": { "message": "quick brown fox" }}
]
}
}
}
Scoring Method:
- Sums the scores of all fields.
Query Result Analysis:
| Document | title Score | subject Score | message Score | Final Score (Sum) |
|---|---|---|---|---|
| Doc 1 | 1.5 (brown, fox) | 1.0 (quick) | 5.0 (quick, brown, fox) | 1.5 + 1.0 + 5.0 = 7.5 |
| Doc 2 | 3.0 (quick, brown) | 1.0 (fox) | 1.0 (fox) | 3.0 + 1.0 + 1.0 = 5.0 |
| Doc 3 | 0 | 1.0 (brown) | 1.0 (brown) | 0 + 1.0 + 1.0 = 2.0 |
Conclusion: Document 1 has the highest score because it matches in multiple fields.
Difference from best_fields:
The main difference between best_fields and most_fields lies in the default value of tie_breaker:
- `best_fields`: Default `tie_breaker = 0.0` (takes only the highest score).
- `most_fields`: Default `tie_breaker = 1.0` (sums all scores).
When both are set to the same tie_breaker value, the calculated scores will be the same.
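This equivalence can be checked with the same dis_max formula from the best_fields section: with `tie_breaker = 1.0` it degenerates into a plain sum of the field scores:

```python
def dis_max_score(field_scores, tie_breaker=0.0):
    """Highest field score plus the other fields' scores × tie_breaker."""
    best = max(field_scores)
    return best + (sum(field_scores) - best) * tie_breaker

scores = [1.5, 1.0, 5.0]  # Document 1's per-field scores from the table
print(dis_max_score(scores, 1.0))                 # 7.5
print(dis_max_score(scores, 1.0) == sum(scores))  # True
```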
cross_fields
Cross-field search, treats multiple fields as one large field, suitable for cases like names, addresses, etc., where matches need to span fields.
Test Data (Name Example):
// Document 1
{ "first_name": "Wing", "last_name": "Chou" }
// Document 2
{ "first_name": "Chou", "last_name": "Chen" }
// Document 3
{ "first_name": "John", "last_name": "Wing" }
{
"query": {
"multi_match": {
"query": "Wing Chou",
"type": "cross_fields",
"fields": ["first_name", "last_name"],
"operator": "and"
}
}
}Execution Logic:
According to official documentation, cross_fields analyzes the query string into individual terms and then searches for each term in any of the fields, as if they were one large field.
+blended(terms:[first_name:wing, last_name:wing])
+blended(terms:[first_name:chou, last_name:chou])
This means each term can be scattered across different fields, as long as each term appears in at least one field.
Query Result Analysis:
| Document | Matches? | Explanation |
|---|---|---|
| Doc 1 | ✅ | Wing in first_name, Chou in last_name (scattered across different fields) |
| Doc 2 | ❌ | Only Chou matches, missing Wing |
| Doc 3 | ❌ | Only Wing matches, missing Chou |
WARNING
When the search_analyzer settings for fields are inconsistent (e.g., one field has an analyzer configured and another does not), the behavior of cross_fields changes. For example, the execution logic becomes:
((+first_name:wing +first_name:chou) | (+last_name:wing +last_name:chou))
In this case, all terms must appear in the same field, rather than being scattered across different fields, behaving similarly to best_fields (but with different field ordering).
Additionally, combined_fields queries will fail if fields use different search_analyzers, so if you have custom analyzer requirements, you need to be particularly aware of this limitation.
Scoring Method:
- Blends term frequency statistics across all fields to avoid results being skewed by high term frequency in a single field.
- `tie_breaker` can be used to adjust scoring behavior (default is 0.0).
phrase
Phrase query, terms must appear in order.
Test Data:
// Document 1
{ "title": "quick brown fox", "message": "The fox is quick" }
// Document 2
{ "title": "brown quick fox", "message": "quick brown fox jumps" }
// Document 3
{ "title": "fast brown fox", "message": "A brown and quick animal" }
{
"query": {
"multi_match": {
"query": "quick brown fox",
"type": "phrase",
"fields": ["title", "message"]
}
}
}
Internal Execution Logic (equivalent to):
{
"query": {
"dis_max": {
"queries": [
{ "match_phrase": { "title": "quick brown fox" }},
{ "match_phrase": { "message": "quick brown fox" }}
]
}
}
}
Query Result Analysis:
| Document | title matches | message matches | Returns? |
|---|---|---|---|
| Doc 1 | ✅ (order correct) | ❌ (order wrong: "fox is quick") | ✅ |
| Doc 2 | ❌ (order wrong: "brown quick fox") | ✅ (order correct) | ✅ |
| Doc 3 | ❌ (extra "fast" in middle) | ❌ (terms scattered: "brown and quick") | ❌ |
Conclusion: Phrase queries require terms to appear adjacent and in order.
Using with slop parameter:
{
"query": {
"multi_match": {
"query": "quick brown fox",
"type": "phrase",
"fields": ["title", "message"],
"slop": 1
}
}
}
Query Result Changes:
| Document | title matches | message matches | Returns? |
|---|---|---|---|
| Doc 1 | ✅ | ❌ (requires slop = 2) | ✅ |
| Doc 2 | ❌ (requires slop = 2) | ✅ | ✅ |
| Doc 3 | ✅ ("fast" counts as 1 interval) | ❌ (requires larger slop) | ✅ |
phrase_prefix
Phrase prefix query, the last term can be a prefix match.
Test Data:
// Document 1
{ "title": "quick brown fox", "message": "quick brown forest" }
// Document 2
{ "title": "quick brown food", "message": "quick brown" }
// Document 3
{ "title": "fast brown fox", "message": "quick blue forest" }
{
"query": {
"multi_match": {
"query": "quick brown f",
"type": "phrase_prefix",
"fields": ["title", "message"]
}
}
}
Internal Execution Logic (equivalent to):
{
"query": {
"dis_max": {
"queries": [
{ "match_phrase_prefix": { "title": "quick brown f" }},
{ "match_phrase_prefix": { "message": "quick brown f" }}
]
}
}
}
Query Result Analysis:
| Document | title matches | message matches | Returns? |
|---|---|---|---|
| Doc 1 | ✅ (f prefix matches fox) | ✅ (f prefix matches forest) | ✅ |
| Doc 2 | ✅ (f prefix matches food) | ❌ (no term starting with f) | ✅ |
| Doc 3 | ❌ (missing "quick") | ❌ (missing "brown") | ❌ |
Conclusion: The first N-1 terms must match exactly and in order, the last term can be a prefix match.
bool_prefix
Boolean prefix query, the last term uses prefix matching, other terms use exact matching.
Test Data:
// Document 1
{ "title": "quick brown fox", "message": "forest animals" }
// Document 2
{ "title": "brown food quick", "message": "quick forest" }
// Document 3
{ "title": "fast fox", "message": "brown quick forest" }
{
"query": {
"multi_match": {
"query": "quick brown f",
"type": "bool_prefix",
"fields": ["title", "message"]
}
}
}
Scoring Method:
- Similar to `most_fields`, but uses a `match_bool_prefix` query per field.
- Supports fuzzy query parameters, but they are only effective for the non-prefix terms.
Query Result Analysis:
| Document | title matches | message matches | Returns? | Explanation |
|---|---|---|---|---|
| Doc 1 | ✅ (quick, brown, f prefix) | ✅ (f prefix matches forest) | ✅ | All terms match |
| Doc 2 | ✅ (quick, brown, f prefix matches food) | ✅ (quick, f prefix matches forest) | ✅ | Term order doesn't matter |
| Doc 3 | ✅ (f prefix matches fox) | ✅ (brown, quick, f prefix matches forest) | ✅ | Terms can be scattered across fields |
Difference from phrase_prefix:
| Feature | phrase_prefix | bool_prefix |
|---|---|---|
| Term order | Must be in order | Order not required |
| Term position | Must be adjacent | Can be scattered |
| Use case | Exact phrase search | Flexible autocomplete |
Example:
Querying "quick brown f":
- `phrase_prefix`: Must be in the order "quick brown f...".
- `bool_prefix`: Can be any order, like "brown quick f..." or "f... brown quick".
3. Combined Fields Query - Cross-Field Term Search
The combined_fields query adopts a term-centric approach, treating multiple text fields as a single combined field for searching. It is particularly suitable for cases where query terms might be scattered across multiple fields, such as an article's title, abstract, and body.
Basic Query:
{
"query": {
"combined_fields": {
"query": "database systems",
"fields": ["title", "abstract", "body"],
"operator": "and"
}
}
}
Test Data:
// Document 1
{
"title": "Database Management",
"abstract": "Modern systems overview",
"body": "Relational database concepts"
}
// Document 2
{
"title": "Information Systems",
"abstract": "Database architecture",
"body": "Design patterns"
}
// Document 3
{
"title": "NoSQL Solutions",
"abstract": "Alternative approaches",
"body": "Non-relational systems"
}
Query Result Analysis:
When querying "database systems":
| Document | Matches? | Returns? | Explanation |
|---|---|---|---|
| Doc 1 | ✅ | ✅ | "database" in title and body, "systems" in abstract |
| Doc 2 | ✅ | ✅ | "database" in abstract, "systems" in title |
| Doc 3 | ❌ | ❌ | Only "systems" matches (in body); missing "database", so it fails with operator "and" (it would match with "or") |
Main Parameters
fields (Required)
List of fields, supports wildcards. All fields must be of type text and use the same search analyzer.
{
"query": {
"combined_fields": {
"query": "quick search",
"fields": ["title^2", "content", "*_text"]
}
}
}
boost
You can use the ^ symbol to set field weights (must be ≥ 1.0, can be a decimal), or use the boost parameter to adjust the weight of the entire query:
{
"query": {
"combined_fields": {
"query": "distributed consensus",
"fields": ["title^2", "body"],
"boost": 1.5
}
}
}
Test Data:
// Document 1
{ "title": "Consensus Algorithms", "body": "Distributed systems basics" }
// Document 2
{ "title": "Network Protocols", "body": "Distributed consensus mechanisms" }
Scoring Method:
- Document 1: `title` contains "consensus" (weight × 2), `body` contains "distributed"; its overall score is higher.
- Document 2: Both terms are in `body` (no weight bonus), so its score is lower.
operator
Sets the logical relationship between terms, default is or.
- `or` (default): Matches if any term matches.
- `and`: All terms must match.
{
"query": {
"combined_fields": {
"query": "database systems",
"fields": ["title", "abstract", "body"],
"operator": "and"
}
}
}
minimum_should_match
Minimum number of matches; usage is the same as in the match query. Supports:
- Positive integer: Absolute quantity (e.g., `3`).
- Negative integer: Allowed missing quantity (e.g., `-1`).
- Percentage: `"75%"` or `"-25%"`.
- Condition combination: `"3<90%"` or `"2<-25% 9<-3"`.
For detailed explanation, please refer to the minimum_should_match parameter in the "Match Query" section.
{
"query": {
"combined_fields": {
"query": "quick brown fox jumps",
"fields": ["title", "content"],
"minimum_should_match": "75%"
}
}
}
zero_terms_query
How to handle cases where there are no tokens after analysis, default is none.
- `none` (default): Returns no documents.
- `all`: Returns all documents.
For detailed explanation, please refer to the zero_terms_query parameter in the "Match Query" section.
auto_generate_synonyms_phrase_query
Whether to automatically create phrase queries for multi-term synonyms, default is true.
{
"query": {
"combined_fields": {
"query": "quick",
"fields": ["title", "body"],
"auto_generate_synonyms_phrase_query": true
}
}
}
Effect: If "quick" has a synonym "fast running", a phrase query for "fast running" will be automatically created.
WARNING
Using the synonym feature requires configuring a synonym filter in the field's search_analyzer. However, combined_fields requires all fields to use the same search_analyzer. If the analyzer settings for the fields are inconsistent, the query will fail. Therefore, when using this parameter, ensure all queried fields use the same synonym configuration.
Execution Logic
{
"query": {
"combined_fields": {
"query": "database systems",
"fields": ["title", "abstract"],
"operator": "and"
}
}
}
Actual Execution Logic:
+(combined("database", fields:["title", "abstract"]))
+(combined("systems", fields:["title", "abstract"]))
Meaning: Each term must appear in at least one field (terms can be scattered across different fields).
Usage Limitations
- Field Type Limitation: Only supports text fields, does not support keyword, numeric, date, etc.
- Analyzer Limitation: All fields must use the same search analyzer.
- Similarity Limitation: Only supports BM25 similarity (Elasticsearch's default), does not support custom similarity or per-field similarity settings.
- Clause Count Limitation: The number of query clauses is limited by `indices.query.bool.max_clause_count` (default 4096), calculated as "number of fields × number of terms".
Example:
{
"query": {
"combined_fields": {
"query": "quick brown fox jumps",
"fields": ["title", "abstract", "body"]
}
}
}
- Number of terms: 4 (quick, brown, fox, jumps).
- Number of fields: 3 (title, abstract, body).
- Clause count: 4 × 3 = 12 (far below the 4096 limit).
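The clause-count arithmetic is simply the product of the two list lengths, which makes it easy to sanity-check a query against the limit before sending it (`clause_count` is a made-up helper):

```python
def clause_count(terms, fields):
    """combined_fields expands to roughly one clause per (term, field) pair."""
    return len(terms) * len(fields)

MAX_CLAUSE_COUNT = 4096  # default indices.query.bool.max_clause_count

terms = "quick brown fox jumps".split()
fields = ["title", "abstract", "body"]
count = clause_count(terms, fields)
print(count, count <= MAX_CLAUSE_COUNT)  # 12 True
```

The limit only becomes a practical concern with wildcard field patterns that expand to many fields combined with long queries.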
4. Match Phrase Query - Phrase Search
Must match the phrase order completely, suitable for searching fixed phrases.
{
"query": {
"match_phrase": {
"content": {
"query": "quick brown fox",
"slop": 1
}
}
}
}
Parameter Explanation:
- `query`: The phrase to search for.
- `analyzer`: Specifies the analyzer (defaults to the analyzer configured for the field).
- `boost`: Adjusts the relevance score weight, default is `1.0`.
- `slop`: Maximum number of position intervals allowed between terms, default is `0` (terms must be completely adjacent).
- `zero_terms_query`: How to handle cases where there are no tokens after analysis (`none` or `all`).
Test Data:
// Document 1
{ "content": "The quick brown fox jumps over the lazy dog" }
// Document 2
{ "content": "A quick and brown fox in the forest" }
// Document 3
{ "content": "The brown quick fox runs fast" }
Query Result (slop = 0):
{
"query": {
"match_phrase": {
"content": "quick brown fox"
}
}
}
| Document | Matches? | Explanation |
|---|---|---|
| Doc 1 | ✅ | Term order correct and adjacent |
| Doc 2 | ❌ | "and" in middle, not adjacent |
| Doc 3 | ❌ | Order wrong (brown quick) |
Query Result (slop = 1):
{
"query": {
"match_phrase": {
"content": {
"query": "quick brown fox",
"slop": 1
}
}
}
}
| Document | Matches? | Explanation |
|---|---|---|
| Doc 1 | ✅ | Term order correct and adjacent |
| Doc 2 | ✅ | 1 term in middle ("and"), meets slop = 1 |
| Doc 3 | ❌ | Order wrong, requires 2 moves to match |
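A rough way to see how slop tolerates gaps is to model it in a few lines of Python. This sketch only counts in-order gaps between the phrase terms; real sloppy matching also permits reordering at a higher move cost (which is why Doc 3 needs slop ≥ 2), so treat it as an approximation:

```python
def phrase_match(doc_tokens, phrase_terms, slop=0):
    """Simplified slop check: find the phrase terms in order and count
    the extra tokens between them. Out-of-order matching is not modeled."""
    positions, start = [], 0
    for term in phrase_terms:
        try:
            idx = doc_tokens.index(term, start)
        except ValueError:
            return False  # a term is missing entirely
        positions.append(idx)
        start = idx + 1
    gaps = (positions[-1] - positions[0]) - (len(phrase_terms) - 1)
    return gaps <= slop

doc1 = "the quick brown fox jumps over the lazy dog".split()
doc2 = "a quick and brown fox in the forest".split()
print(phrase_match(doc1, ["quick", "brown", "fox"], slop=0))  # True
print(phrase_match(doc2, ["quick", "brown", "fox"], slop=0))  # False
print(phrase_match(doc2, ["quick", "brown", "fox"], slop=1))  # True
```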
5. Term Query - Exact Match
Used for exact value queries, does not perform tokenization, matches terms directly in the index.
{
"query": {
"term": {
"status": {
"value": "published"
}
}
}
}
Parameter Explanation:
- value: The exact value to query.
- boost: Adjusts the relevance score weight, default is 1.0.
- case_insensitive: Whether to ignore case, default is false (supported since Elasticsearch 7.10+).
Applicable Types:
- Keyword fields: Matches original value exactly.
- Text fields: Matches tokenized terms, not the original text.
- Numeric, Date, Boolean: Exact value match.
Use Cases:
- Exact match for keyword fields (status, tags, IDs, etc.).
- Exact query for numeric, date, boolean values.
- Specific term query for text fields (requires understanding tokenization results).
Test Data:
// Document 1
{ "status": "published", "title": "Elasticsearch Guide" }
// Document 2
{ "status": "draft", "title": "Quick Tutorial" }

Query Example (Keyword field):
{
"query": {
"term": {
"status": "published"
}
}
}| Document | Matches? | Explanation |
|---|---|---|
| Doc 1 | ✅ | status matches "published" exactly |
| Doc 2 | ❌ | status is "draft" |
Query Example (Text field):
Assuming title is a text field using the standard analyzer:
{
"query": {
"term": {
"title": "elasticsearch"
}
}
}
| Document | Matches? | Explanation |
|---|---|---|
| Doc 1 | ✅ | "Elasticsearch Guide" tokenized contains "elasticsearch" |
| Doc 2 | ❌ | Tokenized does not contain "elasticsearch" |
WARNING
When using term query on a text field, the query value is not tokenized, but it will match against the tokenized terms in the index. For example, querying "Elasticsearch Guide" will not match any results because the index stores tokenized "elasticsearch" and "guide", not the full string.
Recommendation: When performing full-text search on text fields, use match query instead of term query.
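The warning above can be illustrated without a cluster by mimicking what happens at index time (`standard_analyze` is a crude stand-in that only lowercases and splits; the real standard analyzer does more, such as stripping punctuation):

```python
def standard_analyze(text):
    """Crude stand-in for the standard analyzer: lowercase and split."""
    return text.lower().split()

index_terms = standard_analyze("Elasticsearch Guide")
print(index_terms)  # ['elasticsearch', 'guide']

# A term query does NOT analyze its input, so it compares raw values
# against the tokenized terms stored in the index:
print("elasticsearch" in index_terms)        # True  -> term "elasticsearch" hits
print("Elasticsearch Guide" in index_terms)  # False -> the full string never hits
```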
6. Terms Query - Multi-Value Exact Match
Matches documents whose field contains any of the listed values, similar to SQL's IN query.
Basic Usage
{
"query": {
"terms": {
"status": ["published", "draft", "pending"],
"boost": 2.0
}
}
}
Parameter Explanation:
- boost: Adjusts the relevance score weight.
- index.max_terms_count: Maximum number of terms allowed, default 65,536; adjustable via index settings.
Test Data:
// Document 1
{ "status": "published", "title": "Article 1" }
// Document 2
{ "status": "draft", "title": "Article 2" }
// Document 3
{ "status": "archived", "title": "Article 3" }

Query Result:
| Document | Matches? | Explanation |
|---|---|---|
| Doc 1 | ✅ | status = "published" |
| Doc 2 | ✅ | status = "draft" |
| Doc 3 | ❌ | status = "archived" not in list |
Terms Lookup - Fetching values from existing documents as search conditions
When you need to search for a large number of terms, you can fetch field values from existing documents as search conditions, avoiding the need to manually list a large number of terms.
Usage Limitations:
- _source must be enabled for the lookup field.
- Does not support cross-cluster search.
- Also subject to the index.max_terms_count limitation (default 65,536).
Parameter Explanation:
- index: Name of the index where the source document resides.
- id: ID of the source document.
- path: Name of the field to fetch values from; supports dot notation for nested objects.
Example Scenario: Suppose there is an index storing article statuses, and you want to find all other documents that have the same status as a specific document.
Test Data:
// Document 1
{ "status": "published", "title": "Article 1" }
// Document 2
{ "status": "draft", "title": "Article 2" }
// Document 3
{ "status": "archived", "title": "Article 3" }

Query: Fetch the status field value from document 2 and search for all documents containing those values
{
"query": {
"terms": {
"status": {
"index": "my-index",
"id": "2",
"path": "status"
}
}
}
}
Execution Flow:
1. Elasticsearch fetches the document with ID 2 from the my-index index.
2. Reads the status field value: ["draft"].
3. Uses ["draft"] as the search condition, equivalent to executing:
{ "query": { "terms": { "status": ["draft"] } } }
Query Result:
| Document | Matches? | Explanation |
|---|---|---|
| Doc 1 | ❌ | status = "published" does not match |
| Doc 2 | ✅ | status = "draft" |
| Doc 3 | ❌ | status = "archived" does not match |
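Conceptually, a terms lookup is just two steps — fetch a field from one document, then use its values as an ordinary terms list. A minimal sketch with plain dictionaries standing in for the index (the data is the test data above):

```python
# Plain dictionaries standing in for the index
docs = {
    "1": {"status": "published"},
    "2": {"status": "draft"},
    "3": {"status": "archived"},
}

# Step 1: fetch the source document and read the lookup field
lookup_values = [docs["2"]["status"]]  # ["draft"]

# Step 2: run an ordinary terms query with those values
matches = [doc_id for doc_id, d in docs.items() if d["status"] in lookup_values]
print(matches)  # ['2']
```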
7. Range Query - Range Search
Used for numeric and date range queries.
Basic Usage
{
"query": {
"range": {
"age": {
"gte": 18,
"lte": 65,
"boost": 2.0
}
}
}
}
Parameter Explanation:
- gt: Greater than.
- gte: Greater than or equal.
- lt: Less than.
- lte: Less than or equal.
- format: Date format; overrides the default format in the field mapping.
- relation: Only applicable to range type fields (e.g., date_range, integer_range), specifies the range matching method:
  - INTERSECTS (default): Matches if the query range overlaps the document range.
  - CONTAINS: The document range completely contains the query range.
  - WITHIN: The document range is completely within the query range.
- time_zone: Time zone setting, used to convert date values to UTC.
- boost: Adjusts the relevance score weight (default 1.0).
Test Data:
// Document 1
{ "age": 25, "name": "Alice" }
// Document 2
{ "age": 17, "name": "Bob" }
// Document 3
{ "age": 70, "name": "Charlie" }

Query Result (age range 18-65):
| Document | Matches? | Explanation |
|---|---|---|
| Doc 1 | ✅ | 25 is in range |
| Doc 2 | ❌ | 17 < 18 |
| Doc 3 | ❌ | 70 > 65 |
Date Range Query
Basic Date Example:
{
"query": {
"range": {
"created_date": {
"gte": "2024-01-01",
"lte": "2024-12-31",
"format": "yyyy-MM-dd"
}
}
}
}
Date Example using Date Math:
{
"query": {
"range": {
"created_date": {
"gte": "now-1d/d",
"lte": "now/d"
}
}
}
This query returns documents whose created_date falls between the start of yesterday (gte rounds down) and the end of today (lte rounds up).
Date Math Syntax Explanation:
- now: Current time (UTC).
- +1h: Plus 1 hour.
- -1d: Minus 1 day.
- /d: Round to the day.
- /M: Round to the month.
- /y: Round to the year.
Using Date Math Operator ||:
When a fixed date needs to be combined with date math (e.g., rounding), you must use || to connect them:
{
"query": {
"range": {
"created_date": {
"gte": "2024-01-01||/d", // Use || to connect date and rounding operation
"lte": "2024-12-31||/d"
}
}
}
}
Date Math Rounding Rules:
| Operator | Rounding Behavior | Example |
|---|---|---|
| gt | Round up to the first millisecond after the rounded range (exclusive) | 2014-11-18||/M → 2014-12-01T00:00:00.000Z |
| gte | Round down to the first millisecond (inclusive) | 2014-11-18||/M → 2014-11-01T00:00:00.000Z |
| lt | Round down to the last millisecond before the rounded range (exclusive) | 2014-11-18||/M → 2014-10-31T23:59:59.999Z |
| lte | Round up to the last millisecond of the range (inclusive) | 2014-11-18||/M → 2014-11-30T23:59:59.999Z |
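The four rounding rules can be reproduced for /M with a small helper (`round_month` is a hypothetical function for illustration; it only handles month rounding at millisecond precision):

```python
from datetime import datetime, timedelta

def round_month(value: str, op: str) -> datetime:
    """Sketch of Elasticsearch's ||/M rounding per range operator."""
    start = datetime.fromisoformat(value).replace(
        day=1, hour=0, minute=0, second=0, microsecond=0)
    nxt = (start.replace(year=start.year + 1, month=1) if start.month == 12
           else start.replace(month=start.month + 1))
    return {
        "gte": start,                             # first ms of the month
        "gt": nxt,                                # first ms after the month
        "lte": nxt - timedelta(milliseconds=1),   # last ms of the month
        "lt": start - timedelta(milliseconds=1),  # last ms before the month
    }[op]

print(round_month("2014-11-18", "gt"))  # 2014-12-01 00:00:00
```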
format Parameter Explanation
Role of the format parameter:
- Overrides the date format defined in the field mapping.
- Specifies the date format for the query parameters (gte, gt, lte, lt).
format Usage Rules:
1. If the date field mapping does not specify a format:
   - Elasticsearch supports several common date formats by default.
   - It will attempt to parse the value automatically.
2. If the index mapping specifies a format:
   - Query parameters (gte, lte, etc.) must match the format defined in the mapping.
   - Or override it with the format parameter in the query.
3. When using the format parameter:
   - All query parameters (gte, gt, lte, lt) must match the format specified by the format parameter.
   - Inconsistent formats will cause the query to fail or produce unexpected results.
Example:
// Index mapping definition
{
"mappings": {
"properties": {
"created_date": {
"type": "date",
"format": "yyyy-MM-dd'T'HH:mm:ss'Z'" // Define format
}
}
}
}
// ✅ Example 1: Query format matches mapping exactly
{
"query": {
"range": {
"created_date": {
"gte": "2024-01-01T00:00:00Z",
"lte": "2024-12-31T23:59:59Z"
}
}
}
}
// ❌ Example 2: Query format does not match mapping (only provides YMD)
// Error: Format mismatch, cannot parse
{
"query": {
"range": {
"created_date": {
"gte": "2024-01-01",
"lte": "2024-12-31"
}
}
}
}
// ✅ Example 3: Use format parameter to override mapping format
{
"query": {
"range": {
"created_date": {
"gte": "2024-01-01",
"lte": "2024-12-31",
"format": "yyyy-MM-dd" // Override mapping format
}
}
}
}
// ❌ Example 4: Query parameter format does not match format parameter
// Error: Query parameter format does not match format parameter
{
"query": {
"range": {
"created_date": {
"gte": "2024-01-01T00:00:00Z", // Contains time
"lte": "2024-12-31T23:59:59Z",
"format": "yyyy-MM-dd" // format only defines YMD
}
}
}
}

Time Zone Handling
Using the time_zone parameter:
{
"query": {
"range": {
"timestamp": {
"time_zone": "+01:00",
"gte": "2020-01-01T00:00:00",
"lte": "now"
}
}
}
}
Time Zone Conversion Explanation:
- The time_zone parameter accepts an ISO 8601 UTC offset (e.g., +01:00, -08:00).
- It can also use IANA time zone IDs (e.g., America/Los_Angeles, Asia/Taipei).
- In the example, 2020-01-01T00:00:00 uses UTC offset +01:00, so it is converted to 2019-12-31T23:00:00 UTC.
- Note: The time_zone parameter does not affect the value of now; now is always the current system time in UTC.
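The offset conversion is ordinary time zone arithmetic, which can be verified with Python's datetime:

```python
from datetime import datetime, timedelta, timezone

# "2020-01-01T00:00:00" interpreted in time_zone +01:00, then converted to UTC
local = datetime(2020, 1, 1, 0, 0, 0, tzinfo=timezone(timedelta(hours=1)))
utc = local.astimezone(timezone.utc)
print(utc.isoformat())  # 2019-12-31T23:00:00+00:00
```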
Missing Date Components
When the date format is incomplete, Elasticsearch fills in missing components with the following defaults (the year has no default and must always be provided):
| Component | Default Value |
|---|---|
MONTH_OF_YEAR | 01 |
DAY_OF_MONTH | 01 |
HOUR_OF_DAY | 23 |
MINUTE_OF_HOUR | 59 |
SECOND_OF_MINUTE | 59 |
NANO_OF_SECOND | 999_999_999 |
Official Documentation Example (Date part):
- If the format is yyyy-MM and the gt value is 2099-12:
- Elasticsearch converts it to 2099-12-01T23:59:59.999_999_999Z.
- It retains the provided year (2099) and month (12).
- It uses the default day (01), hour (23), minute (59), second (59), and nanosecond (999_999_999).
Actual Test Results (Time part):
The behavior of the time part differs from the official documentation explanation. Actual tests found:
✅ Cases that can be queried successfully:
{
"query": {
"range": {
"created_date": {
"gte": "2023-01-15T08", // Only provided up to the hour
"lte": "2023-01-15T08"
}
}
}
}
- Can find the data at 2023-01-15T08:30:00Z.
- This suggests Elasticsearch truncates both the document value and the query parameter to the same precision before comparing.
❌ Cases that cannot be queried:
// Case 1: Using gt and lte
{
"query": {
"range": {
"joined_date": {
"gt": "2023-01-15T08", // Greater than (exclusive)
"lte": "2023-01-15T08"
}
}
}
}
// Case 2: Using gte and lt
{
"query": {
"range": {
"joined_date": {
"gte": "2023-01-15T08",
"lt": "2023-01-15T08" // Less than (exclusive)
}
}
}
}
- Neither case finds 2023-01-15T08:30:00Z.
- Because gt and lt exclude the specified precision unit.
Behavior Inference:
- Date part: Follows the official documentation for filling in missing components.
- Time part: Formats the document and query parameters to the same precision, then compares.
- Example: "2023-01-15T08" treats all data within 2023-01-15T08:xx:xx as the same time unit.
- Using gte and lte includes data for the entire hour.
- Using gt or lt excludes that entire time unit.
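The inferred truncate-then-compare behavior can be expressed as a one-line sketch (`same_unit` is a hypothetical helper; it assumes both values share the same ISO 8601 layout):

```python
def same_unit(doc_ts: str, query_ts: str) -> bool:
    """Sketch of the observed behavior: truncate the document timestamp
    to the query value's precision, then compare."""
    return doc_ts[:len(query_ts)] == query_ts

print(same_unit("2023-01-15T08:30:00Z", "2023-01-15T08"))  # True
print(same_unit("2023-01-15T09:30:00Z", "2023-01-15T08"))  # False
```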
Recommended Approach:
To avoid unexpected query results due to precision issues, it is recommended to:
1. Explicitly specify the complete time format.
{ "query": { "range": { "created_date": { "gte": "2023-01-15T08:00:00Z", "lte": "2023-01-15T08:59:59Z" } } } }
2. Use Date Math rounding functionality.
{ "query": { "range": { "created_date": { "gte": "2023-01-15T08:00:00Z||/h", "lte": "2023-01-15T08:59:59Z||/h" } } } }
3. Use gte + lte when querying an entire time unit.
{ "query": { "range": { "created_date": { "gte": "2023-01-15T08", "lte": "2023-01-15T08" } } } }
Numeric vs String Differences
When using range query on a date field, numeric and string parsing methods differ:
// ❌ Error: Numeric values are interpreted as millisecond timestamps
{
"query": {
"range": {
"created_date": {
"gte": 2020 // Interpreted as 1970-01-01T00:00:02.020Z (2020 milliseconds after 1970)
}
}
}
}
// ✅ Correct: Strings are parsed according to format
{
"query": {
"range": {
"created_date": {
"gte": "2020" // Interpreted as 2020-01-01T00:00:00.000Z (Year 2020)
}
}
}
}

Pitfalls of mixing numeric and string values:
When gte/gt/lte/lt mix numeric and string values, different results occur:
// ❌ Error: Mixing numeric and date format strings
{
"query": {
"range": {
"created_date": {
"gte": 2022, // Numeric: interpreted as milliseconds
"lte": "2025-01-01" // String: interpreted as date format
}
}
}
}
// Error: String "2025-01-01" cannot be mixed with numeric, format error
// ✅ Correct: Mixing numeric and pure numeric strings
{
"query": {
"range": {
"created_date": {
"gte": 2025, // Numeric: interpreted as milliseconds
"lte": "2025" // Pure numeric string: interpreted as milliseconds
}
}
}
}
// Success: Both are treated as millisecond timestamps
// ✅ Correct: Uniformly use strings
{
"query": {
"range": {
"created_date": {
"gte": "2022", // String: interpreted as year
"lte": "2025-01-01" // String: interpreted as date
}
}
}
}
Important Principles:
- Use string format consistently to avoid parsing issues caused by mixing numeric and string values.
- Pure numeric strings (e.g., "2025") were treated as millisecond timestamps when mixed with a numeric value in the tests above, but as a year when used alone.
- Date format strings (e.g., "2025-01-01") are parsed according to the format.
- Numeric values are always interpreted as millisecond timestamps.
8. Exists Query - Field Existence Query
Queries whether a field exists (is not null).
Positive Query: Query field exists
{
"query": {
"exists": {
"field": "email"
}
}
}
Test Data:
// Document 1
{ "name": "Alice", "email": "[email protected]" }
// Document 2
{ "name": "Bob", "email": null }
// Document 3
{ "name": "Charlie" }
// Document 4
{ "name": "David", "email": "" }
// Document 5
{ "name": "Eve", "email": [] }

Query Result:
| Document | Matches? | Explanation |
|---|---|---|
| Doc 1 | ✅ | email field exists and has value |
| Doc 2 | ❌ | email field is null |
| Doc 3 | ❌ | No email field |
| Doc 4 | ✅ | Empty string is still considered existing |
| Doc 5 | ❌ | Empty array is considered non-existent |
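The table's rules can be restated as a small predicate over _source values (`field_exists` is a hypothetical helper for illustration; it ignores the index/doc_values, ignore_above, and ignore_malformed special cases covered below):

```python
def field_exists(doc, field):
    """Sketch of exists-query semantics over _source values."""
    if field not in doc:
        return False
    value = doc[field]
    if value is None:
        return False
    if isinstance(value, list):
        return any(v is not None for v in value)
    return True  # note: an empty string still counts as existing

for doc in [{"email": "[email protected]"}, {"email": None}, {},
            {"email": ""}, {"email": []}]:
    print(field_exists(doc, "email"))  # True, False, False, True, False
```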
Negative Query: Query field does not exist
Use must_not combined with exists to query documents where the field does not exist.
{
"query": {
"bool": {
"must_not": {
"exists": {
"field": "email"
}
}
}
}
}

Special Case Explanation
In some cases, even if the field value exists in the original JSON document, the exists query will still determine it as "non-existent":
1. index: false and doc_values: false:
   - index: false: The field is not indexed and cannot be searched.
   - doc_values: false: The field does not store doc values and cannot be used for sorting, aggregation, or script access.
   - When both are set to false, the exists query considers the field non-existent.
2. Exceeding the ignore_above setting: For keyword fields, if the value's length exceeds the ignore_above limit set in the mapping, the value is not indexed.
// Mapping sets ignore_above: 10
{ "tags": "this_is_too_long" } // Length 16, will not be indexed
3. ignore_malformed with a format error: When the field type is numeric, date, etc., but the written data has the wrong format, and ignore_malformed: true is set in the mapping, the value is ignored and not indexed.
// Mapping sets price as an integer type, with ignore_malformed: true
{ "price": "not_a_number" } // Wrong format: not indexed, but the document write succeeds
These settings are mainly used to improve data processing fault tolerance, but be aware that they affect the query results of the exists query.
9. Prefix Query - Prefix Search
Queries documents starting with a specific string.
{
"query": {
"prefix": {
"username": {
"value": "admin"
}
}
}
}
Parameter Explanation:
- value: Prefix string.
- boost: Adjusts the relevance score weight.
- case_insensitive: Whether to ignore case, default is false.
- rewrite: Query rewrite method, used for performance tuning. When a prefix matches a large number of terms, this parameter controls how the matches are handled. Common values include constant_score (default; all matches get the same score) and top_terms_N (keeps only the top N terms). See the official documentation for details.
Test Data:
// Document 1
{ "username": "admin123" }
// Document 2
{ "username": "administrator" }
// Document 3
{ "username": "user456" }

Query Result:
| Document | Matches? | Explanation |
|---|---|---|
| Doc 1 | ✅ | Starts with "admin" |
| Doc 2 | ✅ | Starts with "admin" |
| Doc 3 | ❌ | Does not start with "admin" |
10. Wildcard Query - Wildcard Search
Uses * and ? for fuzzy search (performance is poor, use with caution).
{
"query": {
"wildcard": {
"username": {
"value": "ad*n?",
"case_insensitive": true
}
}
}
}
Wildcard Explanation:
- *: Matches zero or more characters.
- ?: Matches exactly one character.
Parameter Explanation:
- value: Query string containing wildcards.
- wildcard: Alias for value with the same functionality; when both are present, the one that appears last takes precedence.
- boost: Adjusts the relevance score weight.
- case_insensitive: Whether to ignore case, default is false.
- rewrite: Query rewrite method.
Comparison of wildcard and value parameters
Test Data:
// Document 1
{ "username": "admin" }
// Document 2
{ "username": "administrator" }
// Document 3
{ "username": "admins" }
// Document 4
{ "username": "user456" }

Query Example: Using both wildcard and value
{
"query": {
"wildcard": {
"username": {
"wildcard": "admin",
"value": "ad*n?"
}
}
}
}
Parameter Explanation:
- wildcard: "admin": would exactly match "admin".
- value: "ad*n?": would match "ad" + zero or more characters + "n" + exactly one character.
Query Result (Using value: "ad*n?" because it is last):
| Document | Matches? | Explanation |
|---|---|---|
| Doc 1 | ❌ | "admin" has only 5 characters, does not match "ad*n?" pattern (requires one more character after n) |
| Doc 2 | ❌ | "administrator" has no "n" in the second-to-last position; the pattern must match the entire term |
| Doc 3 | ✅ | "admins" matches "ad*n?" pattern |
| Doc 4 | ❌ | Does not start with "ad" |
If wildcard is last:
{
"query": {
"wildcard": {
"username": {
"value": "ad*n?",
"wildcard": "admin"
}
}
}
}
Query Result (Using wildcard: "admin" because it is last):
| Document | Matches? | Explanation |
|---|---|---|
| Doc 1 | ✅ | Exactly matches "admin" |
| Doc 2 | ❌ | Not an exact match for "admin" |
| Doc 3 | ❌ | Not an exact match for "admin" |
| Doc 4 | ❌ | Not an exact match for "admin" |
Performance Notes:
- Avoid leading wildcards (e.g., *term or ?term), which force a scan over every term in the field.
- Wildcard queries have no caching mechanism and perform poorly.
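Python's fnmatch uses the same * and ? semantics and, like a wildcard query, requires the pattern to cover the whole term, so it is handy for checking patterns offline:

```python
import fnmatch

# fnmatch's * and ? mirror the wildcard query's operators, and the
# pattern must match the entire string, just like Lucene's term matching
print(fnmatch.fnmatchcase("admins", "ad*n?"))   # True: "ad" + "mi" + "n" + "s"
print(fnmatch.fnmatchcase("admin", "ad*n?"))    # False: no character after "n"
print(fnmatch.fnmatchcase("user456", "ad*n?"))  # False: wrong prefix
```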
11. Regexp Query - Regular Expression Search
Uses regular expressions for complex matching (performance is worst, use with caution).
{
"query": {
"regexp": {
"phone": {
"value": "09[0-9]{8}"
}
}
}
}
Parameter Explanation:
- value: Regular expression pattern.
- flags: Regular expression flags (e.g., COMPLEMENT, INTERVAL), used to enable additional operators.
- case_insensitive: Whether to ignore case, default is false.
- max_determinized_states: Maximum number of automaton states, default is 10000. This limits the complexity the regular expression engine will accept, preventing overly complex expressions from causing performance problems or memory exhaustion; an exception is thrown when the expression is too complex.
- rewrite: Query rewrite method.
Test Data:
// Document 1
{ "phone": "0912345678" }
// Document 2
{ "phone": "0987654321" }
// Document 3
{ "phone": "02-12345678" }

Query Result (Querying "09[0-9]{8}"):
| Document | Matches? | Explanation |
|---|---|---|
| Doc 1 | ✅ | Matches 09 start + 8 digits |
| Doc 2 | ✅ | Matches 09 start + 8 digits |
| Doc 3 | ❌ | Format does not match |
Flags Parameter Explanation and Examples
The flags parameter is used to enable additional operators for the Lucene regular expression engine. The following uses the same test data to demonstrate the effects of different flags.
Note: These symbols (~, #, <>, &, @) are Lucene-specific extensions, not standard general-purpose regular expression syntax.
Test Data:
// Document 1
{ "code": "abc123" }
// Document 2
{ "code": "abc456" }
// Document 3
{ "code": "xyz789" }
// Document 4
{ "code": "def123" }
// Document 5
{ "code": "abc" }

1. COMPLEMENT - Negation Pattern
Uses the ~ operator to negate the subsequent pattern.
{
"query": {
"regexp": {
"code": {
"value": "abc~123",
"flags": "COMPLEMENT"
}
}
}
}
Query Result:
| Document | Matches? | Explanation |
|---|---|---|
| Doc 1 | ❌ | "abc123" contains the negated "123" |
| Doc 2 | ✅ | "abc456" matches "abc" followed by something that is not "123" |
| Doc 3 | ❌ | Does not start with "abc" |
| Doc 4 | ❌ | Does not start with "abc" |
| Doc 5 | ✅ | "abc" followed by nothing, which is not "123" |
Usage Notes for text fields:
Be particularly careful about the impact of tokenization when using ~ negation on text fields. For example:
// Assume name field is text type
// Data: { "name": "Wing Chou" }
// Query
{
"query": {
"regexp": {
"name": {
"value": "~(wing)",
"flags": "COMPLEMENT"
}
}
}
}
At first glance, it might seem this query would exclude "Wing Chou", but in reality:
- "Wing Chou" becomes ["wing", "chou"] after tokenization.
- ~(wing) negates "wing", but "chou" still matches.
- Therefore, "Wing Chou" still appears in the query results.
It is recommended to use negation operators on keyword fields to avoid unexpected results caused by tokenization.
2. INTERVAL - Numeric Range
Uses the <> operator to match numeric ranges.
{
"query": {
"regexp": {
"code": {
"value": "abc<100-200>",
"flags": "INTERVAL"
}
}
}
}
Query Result:
| Document | Matches? | Explanation |
|---|---|---|
| Doc 1 | ✅ | "abc123" matches abc + number in 100-200 range |
| Doc 2 | ❌ | 456 in "abc456" is out of range |
| Doc 3 | ❌ | Does not start with "abc" |
| Doc 4 | ❌ | Does not start with "abc" |
| Doc 5 | ❌ | No number after "abc" |
3. INTERSECTION - AND Operation
Uses the & operator to match strings that match both patterns simultaneously.
{
"query": {
"regexp": {
"code": {
"value": "abc.+&.+123",
"flags": "INTERSECTION"
}
}
}
}
Query Result:
| Document | Matches? | Explanation |
|---|---|---|
| Doc 1 | ✅ | "abc123" matches both "starts with abc" and "ends with 123" |
| Doc 2 | ❌ | "abc456" does not match "ends with 123" |
| Doc 3 | ❌ | "xyz789" does not match "starts with abc" |
| Doc 4 | ❌ | "def123" does not match "starts with abc" |
| Doc 5 | ❌ | "abc" does not match "ends with 123" |
4. ANYSTRING - Match Any String
Uses the @ operator to match any entire string.
Official Example (combined with exclusion logic):
{
"query": {
"regexp": {
"code": {
"value": "@&~(abc.+)",
"flags": "ANYSTRING|INTERSECTION|COMPLEMENT"
}
}
}
}
This example matches all strings that do not start with "abc".
Note: I cannot understand the actual difference between @&~(abc.+) and simply using ~(abc.+). If you need to use this operator, it is recommended to refer to the official documentation or perform actual tests to confirm the behavior.
5. EMPTY - Match No String
Uses the # operator to represent "matches no string", not even an empty string.
Difference from empty string:
// Empty string matches empty data
// ✅ Matches data where code field is empty string
{
"query": {
"regexp": {
"code": {
"value": ""
}
}
}
}
// # matches no data
// ❌ Matches no data (including empty string)
{
"query": {
"regexp": {
"code": {
"value": "#",
"flags": "EMPTY"
}
}
}
}

Actual Use Case (.NET Example):
Mainly used when dynamically combining regular expressions in code to avoid accidentally matching empty string data when there are no query conditions.
// .NET dynamic combination query condition example
List<string> conditions = new();
if (searchByAbc) {
conditions.Add("abc.*");
}
if (searchByXyz) {
conditions.Add("xyz.*");
}
// Use # to avoid matching empty string when no conditions exist
string pattern = conditions.Count > 0
? string.Join("|", conditions) // "abc.*|xyz.*"
: "#"; // Ensure no data is matched
SearchRequest searchRequest = new() {
Query = new RegexpQuery {
Field = "code",
Value = pattern,
Flags = conditions.Count > 0 ? "ALL" : "EMPTY"
}
};

Notes:
# is a special Lucene operator and cannot be used to match the literal "#" character.
// ❌ Error: Cannot be used to query data containing "#" character
// Query data { "code": "#" } → Cannot find
{
"query": {
"regexp": {
"code": {
"value": "#",
"flags": "EMPTY"
}
}
}
}
// ❌ Error: Cannot be used to query data containing "#" character
// Query data { "code": "#1" } → Cannot find
{
"query": {
"regexp": {
"code": {
"value": "#1",
"flags": "EMPTY"
}
}
}
}To match the literal "#" character, you need to use a backslash escape (see "Special Character Escaping" section below).
6. Combining Multiple Flags
You can use the | delimiter to enable multiple operators simultaneously.
{
"query": {
"regexp": {
"code": {
"value": "abc<100-500>",
"flags": "COMPLEMENT|INTERVAL"
}
}
}
}
Flag Support Options:
- ALL (default): Enables all optional operators.
- NONE: Disables all optional operators.
- COMPLEMENT: Enables the ~ negation operator.
- INTERVAL: Enables the <> range operator.
- INTERSECTION: Enables the & AND operator.
- ANYSTRING: Enables the @ any-string operator.
- EMPTY: Enables the # empty-language operator (matches no string).
Special Character Escaping
In the Lucene regular expression engine, the following characters have special meanings. If you want to use them as ordinary characters, you need to escape them with a backslash \:
Reserved Characters:
. ? + * | { } [ ] ( ) " \ #

Escaping Example:
// ❌ Error: + is a special character
// Query data { "phone": "+886912345678" } → Cannot find
{
"query": {
"regexp": {
"phone": {
"value": "+886.*"
}
}
}
}
// ✅ Correct: Use backslash to escape
// Query data { "phone": "+886912345678" } → Can find
{
"query": {
"regexp": {
"phone": {
"value": "\\+886.*"
}
}
}
}

Notes:
Because the backslash itself needs to be escaped in JSON strings, you need to use a double backslash \\ in JSON queries.
// Need to write "\\" in JSON to represent a single backslash
{ "value": "\\+886.*" } // Actual regular expression is "\+886.*"

Anchor Operator Limitations
Lucene's regular expression engine does not support anchor operators, such as ^ (beginning of line) or $ (end of line). To match a term, the regular expression must match the entire string.
This means:
- ^ and $ do not have their special anchor meaning.
- Regular expressions match the entire field value by default (equivalent to implicit anchors).
- Based on tests, ^ and $ are treated as ordinary characters, not anchor operators (using them returns no results).
Example:
// ✅ Correct: Match pattern directly
{ "value": "abc.*" } // Matches full string starting with abc
// ❌ Not recommended: Cannot find data for abc, inferred that it should try to match ^abc and abc$
{ "value": "^abc" }
{ "value": "abc$" }

Performance Notes:
- Regular expression query performance is extremely poor, should be avoided as much as possible.
- Consider other query types (e.g., prefix, wildcard) instead.
- If you must use it, limit the query scope and set a reasonable max_determinized_states.
- Avoid overly complex regular expressions to prevent hitting the max_determinized_states limit.
13. Fuzzy Query - Fuzzy Search
Error-tolerant query, allows spelling errors. Can be used for text and keyword fields.
Text field example:
{
"query": {
"fuzzy": {
"name": {
"value": "wing",
"fuzziness": "AUTO"
}
}
}
}
Effect: Queries terms within the allowed edit distance of wing.
Example:
- wing ✓ (exact match).
- wang ✓ (1 edit: i → a).
- weng ✓ (1 edit: i → e).
- kang ✗ (2 edits: w → k, i → a; exceeds the allowed distance).
Note: Because text fields are processed by an analyzer (tokenization, lowercasing):
- Index: Wing Chou → tokenized into [wing, chou] (lowercased, split).
- Query: wing → matches the term wing.
Parameter Explanation:
- value: Term to query (required).
- fuzziness: Allowed edit distance (AUTO, 0, 1, 2); AUTO is recommended.
  - AUTO: Automatically determines the edit distance based on term length.
  - 0: No edits allowed (equivalent to a term query).
  - 1: Allows 1 edit.
  - 2: Allows 2 edits.
- prefix_length: The first N characters must match exactly, default is 0.
- max_expansions: Maximum number of candidate terms to expand, default is 50.
- transpositions: Whether to allow adjacent character swaps (e.g., ab → ba), default is true.
Complete Example:
{
"query": {
"fuzzy": {
"title": {
"value": "quikc",
"fuzziness": "AUTO",
"prefix_length": 2,
"max_expansions": 10,
"transpositions": true
}
}
}
}
Parameter Effects:
prefix_length = 2 (the first 2 characters must match exactly):
- quick ✓ (starts with qu).
- quikc ✓ (starts with qu).
- xuick ✗ (starts with xu, does not match the prefix qu).
max_expansions = 10 (Max 10 candidate terms to expand):
Assuming the index has 20+ similar terms (quick, quit, quiz, quiet, quiche...), Elasticsearch will only take the first 10 candidate terms for searching, ignoring the rest.
Purpose: Limiting the expansion count can improve query performance, avoiding resource consumption from too many candidates.
transpositions = true (Allows adjacent character swaps):
- qiuck ✓ (ui ↔ iu, a swap counts as 1 edit).
- qukic ✓ (ki ↔ ik, a swap counts as 1 edit).
transpositions = false (Does not allow swaps):
{
"query": {
"fuzzy": {
"title": {
"value": "qiuck",
"fuzziness": 1,
"transpositions": false
}
}
}
}
- quick ✗ (without transpositions, the iu ↔ ui swap costs 2 edits — replace i → u and u → i — exceeding fuzziness = 1).
- With transpositions = true (the default), the same query would match quick, since the swap counts as a single edit.
Keyword field example:
{
"query": {
"fuzzy": {
"name.keyword": {
"value": "Wing Chow",
"fuzziness": "AUTO"
}
}
}
}
Effect: Performs fuzzy matching against the complete keyword value.
Example:
- Wing Chou ✓ (1 edit: w → u).
- Wing Chow ✓ (exact match).
- Wing Zhou ✓ (2 edits).
- John Wang ✗ (too many edits).
Usage Recommendations:
For text fields:
- Recommended: use a match query with the fuzziness parameter rather than a fuzzy query directly.
- Reason: the match query's input is processed by the analyzer (tokenization, lowercasing, etc.), which better fits typical search requirements.
Example Comparison:
Scenario: Index contains document name = "Wing Chou" (text field)
→ After analyzer processing, the terms in the index are: ["wing", "chou"] (lowercased, tokenized)
Example 1: fuzziness = 0 (Must match exactly)
// Not recommended: Use fuzzy directly (text field)
{
"query": {
"fuzzy": {
"name": {
"value": "Wing", // Does not pass through analyzer, matches "Wing" directly
"fuzziness": 0
}
}
}
}- Query term:
Wing(uppercase W). - Index term:
wing(lowercase w). - fuzziness = 0 means it must match exactly.
- Result: ✗ Cannot find (
Wing≠wing, case differs).
// Recommended: Use match (text field)
{
"query": {
"match": {
"name": {
"query": "Wing", // Passes through analyzer, becomes "wing"
"fuzziness": 0
}
}
}
}- Query term:
Wing→ Passes through analyzer →wing(lowercase). - Index term:
wing(lowercase). - fuzziness = 0 means it must match exactly.
- Result: ✓ Can find (matches exactly).
Example 2: fuzziness = 1 (Allows 1 character difference)
// Not recommended: Use fuzzy directly (text field)
{
"query": {
"fuzzy": {
"name": {
"value": "wing chuo", // Does not tokenize, queries "wing chuo" as a complete term
"fuzziness": 1
}
}
}
}- Query term:
wing chuo(complete string). - Index terms:
wing,chou(tokenized). - Result: ✗ Cannot find (index does not have "wing chuo" as a complete term).
// Recommended: Use match + fuzziness (text field)
{
"query": {
"match": {
"name": {
"query": "wing chuo", // Tokenizes into ["wing", "chou"], and performs fuzzy match on each term
"fuzziness": 1
}
}
}
}
- Query term: `wing chuo` → through the analyzer → `["wing", "chuo"]`.
- Index terms: `wing`, `chou`.
- Result: ✓ Found (`wing` matches exactly; `chuo` is 1 edit away from `chou`).
For keyword fields:
- A `fuzzy` query can be used directly.
- Reason: keyword fields are not processed by an analyzer, so fuzzy matching against the complete value is reasonable.
Summary:
- Text fields: prefer `match` + `fuzziness`.
- Keyword fields: a `fuzzy` query works fine.
- When terms must be matched directly (no analysis needed): use a `fuzzy` query.
Edit Distance Explanation:
Edit distance (Levenshtein distance) is the minimum number of operations required to convert one string into another. Allowed operations:
- Insert a character: `quic` → `quick` (insert k).
- Delete a character: `quickk` → `quick` (delete k).
- Replace a character: `quack` → `quick` (replace a → i).
- Swap adjacent characters (requires `transpositions = true`, the default): `qiuck` → `quick` (swap iu).
For detailed fuzziness parameter explanation, please refer to the "Match Query" section.
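The edit-distance rules above can be sketched in a few lines of Python. This is restricted Damerau-Levenshtein distance, the metric Elasticsearch's fuzziness is based on; the `transpositions` flag here mirrors the behavior of the query parameter of the same name.

```python
def edit_distance(a: str, b: str, transpositions: bool = True) -> int:
    """Minimum edits (insert/delete/replace, optionally adjacent swap)
    to turn string a into string b."""
    m, n = len(a), len(b)
    # d[i][j] = edits needed to turn a[:i] into b[:j]
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i          # delete all of a[:i]
    for j in range(n + 1):
        d[0][j] = j          # insert all of b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # delete a character
                d[i][j - 1] + 1,         # insert a character
                d[i - 1][j - 1] + cost,  # replace (or match)
            )
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[m][n]

print(edit_distance("quic", "quick"))    # 1 (insert k)
print(edit_distance("qiuck", "quick"))   # 1 (swap iu)
print(edit_distance("qiuck", "quick", transpositions=False))  # 2
```

Note how disabling transpositions turns the `qiuck` case into 2 edits (two replacements), which is why a swapped pair can fall outside `fuzziness: 1` if transpositions are turned off.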
14. IDs Query - Query by Document ID
Queries directly by document _id.
{
"query": {
"ids": {
"values": ["1", "2", "3"]
}
}
}
Use Cases:
- Querying by known document IDs.
- Batch querying specific documents.
- Used in combination with other queries.
15. Nested Query - Nested Object Query
Used for querying nested type fields. Can only be used for nested types, not object types. Can preserve the relationships between fields within array elements.
Mapping Definition:
{
"mappings": {
"properties": {
"title": { "type": "text" },
"comments": {
"type": "nested",
"properties": {
"author": { "type": "keyword" },
"rating": { "type": "integer" },
"text": { "type": "text" }
}
}
}
}
}
Basic Query:
{
"query": {
"nested": {
"path": "comments",
"query": {
"bool": {
"must": [
{ "match": { "comments.author": "John" }},
{ "range": { "comments.rating": { "gte": 4 }}}
]
}
}
}
}
}
Parameter Explanation:
- `path`: path to the nested object (required).
- `query`: query to run against the nested objects (required).
- `score_mode`: how the scores of matching nested objects are combined into the document score; default is `avg`.
  - `avg`: average score (default).
  - `sum`: sum of scores.
  - `max`: maximum score.
  - `min`: minimum score.
  - `none`: do not score (score is 0).
- `ignore_unmapped`: whether to ignore the error when `path` is unmapped; default is `false`.
Test Data:
// Document 1
{
"title": "Product A",
"comments": [
{ "author": "John", "rating": 5, "text": "Great!" },
{ "author": "Jane", "rating": 3, "text": "OK" }
]
}
// Document 2
{
"title": "Product B",
"comments": [
{ "author": "John", "rating": 2, "text": "Poor" },
{ "author": "Bob", "rating": 5, "text": "Excellent" }
]
}
Query Result (author = "John" AND rating >= 4):
| Document | Matches? | Explanation |
|---|---|---|
| Doc 1 | ✅ | John's rating is 5 (>= 4) |
| Doc 2 | ❌ | John's rating is 2 (< 4) |
Why is Nested Query needed?
Problem: object type flattens arrays
If comments is an object type (default), Elasticsearch flattens the array, losing the relationships between elements:
// Original data
{
"title": "Product A",
"comments": [
{ "author": "John", "rating": 5 },
{ "author": "Jane", "rating": 3 }
]
}
// After flattening (relationship lost)
{
"title": "Product A",
"comments.author": ["John", "Jane"],
"comments.rating": [5, 3]
}
Example: Incorrect query result (using object type)
Querying products where "John gave 3 points":
{
"query": {
"bool": {
"must": [
{ "term": { "comments.author": "John" }},
{ "term": { "comments.rating": 3 }}
]
}
}
}
Result: ✓ Document 1 is found (❌ but this is wrong: John gave 5 points, not 3).
Reason: Elasticsearch only knows that author contains "John" and rating contains 3; it does not know that "John" corresponds to 5 points.
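The false positive can be reproduced outside Elasticsearch with a small Python sketch (not Elasticsearch code): once the array is flattened into per-field value lists, the author-to-rating pairing is gone.

```python
doc = {
    "title": "Product A",
    "comments": [
        {"author": "John", "rating": 5},
        {"author": "Jane", "rating": 3},
    ],
}

# object type: each sub-field becomes an independent list of values,
# which is all Elasticsearch retains after flattening
flattened = {
    "comments.author": [c["author"] for c in doc["comments"]],
    "comments.rating": [c["rating"] for c in doc["comments"]],
}

# "Did John give 3 points?" — each condition is checked against the
# whole list, so they can be satisfied by DIFFERENT comments
matches = ("John" in flattened["comments.author"]
           and 3 in flattened["comments.rating"])
print(matches)  # True — a false positive: John actually gave 5 points
```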
Solution: Use nested type + nested query
Define comments as a nested type, and Elasticsearch will internally store each array element as an independent sub-document (but it remains one document to the user):
// What you see: one document
{
"title": "Product A",
"comments": [
{ "author": "John", "rating": 5 },
{ "author": "Jane", "rating": 3 }
]
}
// Elasticsearch internal storage structure (hidden, user cannot see):
// ├─ Main document: { "title": "Product A" }
// ├─ Sub-document 1: { "author": "John", "rating": 5 }
// └─ Sub-document 2: { "author": "Jane", "rating": 3 }
Key points:
- To you, it is still one document.
- Elasticsearch internally handles sub-document relationships automatically.
- When querying, use nested query to ensure conditions are matched "within the same sub-document".
Querying products where "John gave 3 points":
{
"query": {
"nested": {
"path": "comments",
"query": {
"bool": {
"must": [
{ "term": { "comments.author": "John" }},
{ "term": { "comments.rating": 3 }}
]
}
}
}
}
}
Result: ✗ Not found (✓ correct: John gave 5 points, not 3).
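Continuing the earlier sketch, this is the behavior a nested query gives you: every condition is evaluated against each array element (sub-document) individually, so both conditions must hold within the same comment.

```python
doc = {
    "title": "Product A",
    "comments": [
        {"author": "John", "rating": 5},
        {"author": "Jane", "rating": 3},
    ],
}

def nested_match(comments, author, rating):
    # Both conditions must hold within the SAME comment,
    # which is what evaluating per sub-document guarantees.
    return any(c["author"] == author and c["rating"] == rating
               for c in comments)

print(nested_match(doc["comments"], "John", 3))  # False — correct this time
print(nested_match(doc["comments"], "John", 5))  # True
```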
score_mode parameter example:
When multiple nested objects in a document match the query, score_mode determines how to calculate the document's final score.
Test Data:
// Document 1
{
"title": "Product A",
"comments": [
{ "author": "Alice", "rating": 5, "text": "Excellent" },
{ "author": "Bob", "rating": 4, "text": "Good" },
{ "author": "Charlie", "rating": 3, "text": "Average" }
]
}
// Document 2
{
"title": "Product B",
"comments": [
{ "author": "David", "rating": 5, "text": "Perfect" }
]
}
Query:
{
"query": {
"nested": {
"path": "comments",
"score_mode": "max",
"query": {
"range": { "comments.rating": { "gte": 3 }}
}
}
}
}
Result Comparison (assuming each matching comment has a score of 1.0):
| Document | Number of matching comments | max | avg | sum | min |
|---|---|---|---|---|---|
| Doc 1 | 3 | 1.0 | 1.0 | 3.0 | 1.0 |
| Doc 2 | 1 | 1.0 | 1.0 | 1.0 | 1.0 |
Explanation:
- With `sum`, Document 1 scores higher (it has 3 matching comments).
- With `max` or `avg`, both documents score the same.
- This affects the sort order.
Purpose:
- `sum` ranks documents with more matching comments higher.
- `max` considers only the single most relevant comment.
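A minimal sketch of how `score_mode` combines per-comment scores into one document score (the scores here are made up for illustration, matching the 1.0-per-comment assumption in the table above):

```python
def combine(scores, mode="avg"):
    """Combine per-nested-object scores the way score_mode does."""
    if not scores or mode == "none":
        return 0.0
    if mode == "avg":
        return sum(scores) / len(scores)
    if mode == "sum":
        return sum(scores)
    if mode == "max":
        return max(scores)
    if mode == "min":
        return min(scores)
    raise ValueError(f"unknown score_mode: {mode}")

doc1 = [1.0, 1.0, 1.0]  # 3 matching comments
doc2 = [1.0]            # 1 matching comment

print(combine(doc1, "sum"), combine(doc2, "sum"))  # 3.0 1.0 — Doc 1 ranks higher
print(combine(doc1, "max"), combine(doc2, "max"))  # 1.0 1.0 — tie
```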
Advanced: Using inner_hits to fetch matching nested objects
Sometimes you don't just want to know "which document matches", but also "which nested object within the document matches".
{
"query": {
"nested": {
"path": "comments",
"query": {
"bool": {
"must": [
{ "term": { "comments.author": "John" }},
{ "range": { "comments.rating": { "gte": 4 }}}
]
}
},
"inner_hits": {}
}
}
}
Explanation:
- `inner_hits` is an object-type parameter.
- An empty object `{}` means the default settings are used.
- `inner_hits` supports various parameters (e.g., `size`, `from`, `_source`), but they are outside the scope of this note.
Return Result:
{
"hits": {
"hits": [
{
"_source": {
"title": "Product A",
"comments": [
{ "author": "John", "rating": 5, "text": "Great!" },
{ "author": "Jane", "rating": 3, "text": "OK" }
]
},
"inner_hits": {
"comments": {
"hits": {
"hits": [
{
"_source": {
"author": "John",
"rating": 5,
"text": "Great!"
}
}
]
}
}
}
}
]
}
}
Purpose: you can see exactly which comment matched the condition, rather than the entire comments array.
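Walking the response above from client code is just nested dict navigation. A sketch, with the response dict abbreviated to only the fields used here:

```python
# Abbreviated inner_hits response, shaped like the example above
response = {
    "hits": {"hits": [{
        "_source": {"title": "Product A"},
        "inner_hits": {"comments": {"hits": {"hits": [
            {"_source": {"author": "John", "rating": 5, "text": "Great!"}},
        ]}}},
    }]},
}

for hit in response["hits"]["hits"]:
    title = hit["_source"]["title"]
    # Only the nested objects that matched, not the whole comments array
    matched = [ih["_source"]
               for ih in hit["inner_hits"]["comments"]["hits"]["hits"]]
    print(title, matched)
```

Note that the inner key (`comments` here) is the nested `path` by default; naming it explicitly via `inner_hits.name` matters when the same path appears in multiple nested clauses.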
object vs nested quick comparison:
| Feature | object (default) | nested |
|---|---|---|
| Array handling | Flattened (relationship lost) | Maintained independently (relationship maintained) |
| Query method | General query (match, term, bool...) | Must use nested query |
| Use case | Single object or array not requiring relationships | Requires maintaining array element relationships |
| Performance | Better | Worse (extra overhead) |
Usage Recommendations:
Use nested when:
- The field is an array.
- You need to query multiple conditions "within the same array element".
- You need to maintain relationships between array elements.
Example Scenarios:
- Order product list (product name + price must correspond).
- Employee project experience (project name + role must correspond).
- Product reviews (reviewer + rating must correspond).
Use object when:
- The field is not an array.
- Relationships between array elements do not need to be maintained.
- You are pursuing better query performance.
Change Log
- 2025-11-04 Initial document creation.
